Abstract:
Recently, deep-learning (DL) models have received considerable attention for timing prediction in the placement and routing (P&R) [...] (2) a timing optimization model, which uses the inference outcomes of our DL-driven prediction model to enable commercial P&R tools to calculate full path delays and set updated timing margins on paths, so that the P&R tools use more accurate margins during timing optimization. Experimental results show that, by using our DTOC framework during timing optimization in P&R, we improve the pre-route prediction accuracy on arc delay and arc output slew by 20-26% on average, and improve the WNS, TNS, and the number of timing-violating paths by 50-63% on average.
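For readers unfamiliar with the reported metrics, the following is a minimal sketch of how WNS (worst negative slack), TNS (total negative slack), and the violating-path count are commonly derived from per-path slacks. The function name `timing_summary` and the slack values are hypothetical and are not taken from the paper; WNS is taken here simply as the minimum slack, which is one common convention.

```python
def timing_summary(slacks_ns):
    """Return (WNS, TNS, violating-path count) for a list of path slacks in ns.

    Assumed convention: WNS is the minimum slack over all paths, TNS is the
    sum of all negative slacks, and a path violates timing when slack < 0.
    """
    negative = [s for s in slacks_ns if s < 0]
    wns = min(slacks_ns) if slacks_ns else 0.0
    tns = sum(negative)
    return wns, tns, len(negative)


# Hypothetical slacks for four timing paths (not data from the paper).
slacks = [0.12, -0.05, -0.30, 0.02]
wns, tns, n_viol = timing_summary(slacks)
print(f"WNS={wns:.2f} ns, TNS={tns:.2f} ns, violating paths={n_viol}")
```

All three quantities depend only on path slacks, so tighter or more accurate per-path margins feed directly into each of them.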
Abstract:
Fixing minimum implant area (MIA) violations in the post-route layout is an essential and unavoidable task for high-performance designs that employ multiple threshold voltages. Unlike conventional approaches, which locally move cells or reassign the $V_{t}$ (threshold voltage) of some cells to resolve MIA violations with little or no consideration of the timing constraint, our proposed approach fully and systematically controls the timing budget while removing MIA violations. Specifically, our solution consists of three sequential steps: (1) performing critical-path-aware cell selection for $V_{t}$ reassignment to fix the intra-row MIA violations while considering the timing constraint and minimizing the power increase; (2) performing a theoretically optimal $V_{t}$ reassignment to fix the inter-row MIA violations while satisfying both the intra-row MIA and timing constraints; (3) refining the $V_{t}$ reassignment to further reduce power consumption while meeting the intra- and inter-row MIA constraints as well as the timing constraints. Experiments on benchmark circuits show that our proposed approach completely resolves MIA violations while ensuring no timing violations and incurring a much smaller power increase than the conventional approaches.
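To make the notion of an intra-row MIA violation concrete, here is a minimal sketch of a detector, not the authors' algorithm. It assumes a simplified model in which cells in a row abut with no gaps or fillers, consecutive cells sharing a $V_{t}$ type form one implant region, and a region violates the rule when its total width is below a minimum. The function name, the tuple layout, and the numbers are all hypothetical.

```python
from itertools import groupby


def intra_row_mia_violations(row, min_width):
    """Find same-Vt runs in one placement row that are narrower than min_width.

    row: list of (cell_name, vt, width) tuples in left-to-right order,
         assumed to abut with no gaps (a simplifying assumption).
    Returns a list of (vt, run_width, [cell names]) for each violating run.
    """
    violations = []
    for vt, run in groupby(row, key=lambda cell: cell[1]):
        cells = list(run)
        run_width = sum(width for _, _, width in cells)
        if run_width < min_width:
            violations.append((vt, run_width, [name for name, _, _ in cells]))
    return violations


# Hypothetical row: widths are in placement sites.
row = [("u1", "LVT", 4), ("u2", "LVT", 3), ("u3", "HVT", 2), ("u4", "LVT", 6)]
viol = intra_row_mia_violations(row, min_width=5)
print(viol)
```

In this toy row, the isolated HVT cell `u3` forms a region only 2 sites wide. Reassigning it to LVT would merge the three runs and remove the violation, but it would also change delay and leakage, which is why the abstract ties cell selection to the timing budget and power.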
Abstract:
Monolithic 3D (M3D) integration provides massive vertical integration through the use of nanoscale inter-layer vias (ILVs). However, high integration density and aggressive scaling of the inter-layer dielectric make ILVs especially prone to defects. We present a low-cost built-in self-test (BIST) method to detect opens, stuck-at faults (SAFs), and bridging faults (shorts) in ILVs. Two test patterns, all-1s and all-0s, are applied to the input side of a set of ILVs (e.g., those making up a bus between two tiers). On the adjacent tier (the output side of the ILVs), the test responses are compacted to a 2-bit signature through space compaction. We prove that this compaction solution does not introduce any fault aliasing. Simulation results using HSPICE and M3D benchmark designs show that the proposed BIST method requires low area overhead and test time, yet provides effective fault localization and detects a wide range of resistive faults.
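The idea of compacting the responses to the two patterns into a 2-bit signature can be illustrated with a simplified behavioral model. The compactor below (an AND of the all-1s responses and an OR of the all-0s responses) is an assumption chosen for illustration; the abstract does not specify the actual compaction logic. The model covers only stuck-at faults, and opens and bridging faults are not modeled.

```python
def ilv_outputs(pattern_bit, n, stuck_at=None):
    """Drive n ILVs with the same bit and return the observed output bits.

    stuck_at: optional dict mapping an ILV index to the value it is stuck at
              (a hypothetical fault model for this sketch).
    """
    stuck_at = stuck_at or {}
    return [stuck_at.get(i, pattern_bit) for i in range(n)]


def signature(n, stuck_at=None):
    """Return the assumed 2-bit signature (and_bit, or_bit).

    and_bit: AND of all outputs under the all-1s pattern (fault-free: 1).
    or_bit:  OR of all outputs under the all-0s pattern (fault-free: 0).
    """
    and_bit = int(all(ilv_outputs(1, n, stuck_at)))
    or_bit = int(any(ilv_outputs(0, n, stuck_at)))
    return and_bit, or_bit


FAULT_FREE = (1, 0)

print(signature(8))              # fault-free bus of 8 ILVs
print(signature(8, {3: 0}))      # ILV 3 stuck-at-0
print(signature(8, {5: 1}))      # ILV 5 stuck-at-1
```

In this sketch, any stuck-at-0 fault pulls the first bit to 0 and any stuck-at-1 fault raises the second bit to 1, so a faulty bus never reproduces the fault-free signature `(1, 0)`.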
Abstract:
A new trend in complex SoC design is chiplet-based IP reuse using 2.5D integration. In this paper, we present a highly integrated design flow that encompasses architecture, circuit, and package to build and simulate heterogeneous 2.5D designs. We chipletize each IP by adding logical protocol translators and physical interface modules. These chiplets are then placed and routed on a silicon interposer. Our package models are used to calculate the power, performance, and area (PPA) and the signal/power integrity of the overall system. Our design space exploration study using this tool flow shows that 2.5D integration incurs a 2.1x PPA overhead compared with its 2D SoC counterpart.
Abstract:
This work addresses a new structural optimization of neuromorphic computing architectures. In theory, it enables deep neural network (DNN) computation to run twice as fast as on existing architectures. Specifically, we propose a new structural technique that mixes dendritic-based and axonal-based neuromorphic cores so as to completely eliminate the inherent non-zero waiting time between cores in the DNN implementation. In addition, in conjunction with the new architecture, we propose a technique for maximally utilizing the computation units so that their total resource overhead is minimized. We provide a set of experimental data to demonstrate the effectiveness (i.e., speed and area) of our proposed architectural optimizations: ~2× speedup with no accuracy penalty on the neuromorphic computation, or improved accuracy with no additional computation time.
Abstract:
This work proposes a new method of synthesizing asynchronous circuits that targets their practical usability. The key contribution of this work is an effective technique for intermixing two design principles of asynchronous circuits, namely handshaking-based single-rail and timing-annotated (i.e., delay-insensitive) dual-rail. Specifically, we propose methods for partitioning an input (synchronous) circuit to transform it into a circuit with single-rail and dual-rail sub-circuits, and for designing a seamless interface that stitches the sub-circuits together, to achieve partial or full combinations of high performance, low power consumption, strong immunity to delay and noise variability in low-voltage designs, and mitigation of side-channel attacks in hardware security.